Create Cassandra db schema on session initialization #5922

Open
wants to merge 29 commits into
base: main

akstron
Contributor

@akstron akstron commented Sep 2, 2024

Create Schema (if not present) on Session Initialization

Once a session is established with Cassandra, the added code parses the template file that contains the schema-creation queries and renders the individual queries from it. It then executes those queries to create the required types and tables.

Which problem is this PR solving?

Resolves #5797

Description of the changes

  • The PR includes the following changes:
    1. Embedding the template files into the binary.
    2. Creating the database schema during initialization, once a session to the database is established.
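
A minimal sketch of the render step described above, assuming the template is a Go `text/template` with a single keyspace parameter. The names `schemaParams`, `schemaTmpl`, and `renderQueries` are hypothetical, and the template text is an inline stand-in for the embedded file, so this may differ from the PR's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// schemaParams holds hypothetical template parameters; the PR's real
// parameter set is not shown in this thread.
type schemaParams struct {
	Keyspace string
}

// schemaTmpl stands in for a template file embedded into the binary.
// The statements are illustrative only.
const schemaTmpl = `
CREATE TYPE IF NOT EXISTS {{.Keyspace}}.keyvalue (
    key text,
    value_string text
);
CREATE TABLE IF NOT EXISTS {{.Keyspace}}.service_names (
    service_name text,
    PRIMARY KEY (service_name)
);
`

// renderQueries executes the template and splits the output into
// individual statements on ';'.
func renderQueries(tmplText string, p schemaParams) ([]string, error) {
	tmpl, err := template.New("schema").Parse(tmplText)
	if err != nil {
		return nil, fmt.Errorf("parse template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, p); err != nil {
		return nil, fmt.Errorf("render template: %w", err)
	}
	var queries []string
	for _, stmt := range strings.Split(buf.String(), ";") {
		if stmt = strings.TrimSpace(stmt); stmt != "" {
			queries = append(queries, stmt+";")
		}
	}
	return queries, nil
}

func main() {
	queries, err := renderQueries(schemaTmpl, schemaParams{Keyspace: "jaeger_v1_test"})
	if err != nil {
		panic(err)
	}
	for _, q := range queries {
		fmt.Println(q)
	}
}
```

Splitting on `;` is a simplification that would break on a semicolon inside a string literal; it is used here only to keep the sketch short.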

How was this change tested?

  • Schema rendering is tested with a unit test.
  • bash scripts/cassandra-integration-test.sh -s 4 v004 v2

@akstron akstron force-pushed the create-database-scheme-cassandra branch 2 times, most recently from 69275fb to bcad4c0 Compare October 25, 2024 13:34
@akstron akstron marked this pull request as ready for review October 25, 2024 13:52
@akstron akstron requested a review from a team as a code owner October 25, 2024 13:52
@akstron akstron force-pushed the create-database-scheme-cassandra branch 2 times, most recently from 90368b1 to afc786d Compare October 28, 2024 10:50
@akstron akstron changed the title Create Cassandra db schema on session initialization [WIP] Create Cassandra db schema on session initialization Nov 9, 2024

codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 53.33333% with 63 lines in your changes missing coverage. Please review.

Project coverage is 96.15%. Comparing base (295146c) to head (2c8de88).
Report is 23 commits behind head on main.

Files with missing lines          Patch %   Lines
pkg/cassandra/config/schema.go    36.70%    50 Missing ⚠️
pkg/cassandra/config/config.go    76.78%    11 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5922      +/-   ##
==========================================
- Coverage   96.43%   96.15%   -0.29%     
==========================================
  Files         355      356       +1     
  Lines       20157    20290     +133     
==========================================
+ Hits        19439    19510      +71     
- Misses        530      589      +59     
- Partials      188      191       +3     
Flag Coverage Δ
badger_v1 8.20% <0.00%> (-0.11%) ⬇️
badger_v2 1.65% <0.00%> (-0.03%) ⬇️
cassandra-4.x-v1 14.48% <22.96%> (+0.09%) ⬆️
cassandra-4.x-v2 1.59% <0.00%> (-0.03%) ⬇️
cassandra-5.x-v1 14.48% <22.96%> (+0.09%) ⬆️
cassandra-5.x-v2 1.59% <0.00%> (-0.03%) ⬇️
elasticsearch-6.x-v1 18.36% <0.00%> (-0.25%) ⬇️
elasticsearch-7.x-v1 18.43% <0.00%> (-0.25%) ⬇️
elasticsearch-8.x-v1 18.61% <0.00%> (-0.25%) ⬇️
elasticsearch-8.x-v2 1.64% <0.00%> (-0.03%) ⬇️
grpc_v1 9.32% <0.00%> (-0.13%) ⬇️
grpc_v2 6.89% <0.00%> (-0.10%) ⬇️
kafka-v1 8.76% <0.00%> (-0.12%) ⬇️
kafka-v2 1.65% <0.00%> (-0.03%) ⬇️
memory_v2 1.65% <0.00%> (-0.03%) ⬇️
opensearch-1.x-v1 18.49% <0.00%> (-0.25%) ⬇️
opensearch-2.x-v1 18.49% <0.00%> (-0.25%) ⬇️
opensearch-2.x-v2 1.64% <0.00%> (-0.04%) ⬇️
tailsampling-processor 0.46% <0.00%> (-0.01%) ⬇️
unittests 95.03% <46.66%> (-0.32%) ⬇️


@akstron akstron changed the title [WIP] Create Cassandra db schema on session initialization Create Cassandra db schema on session initialization Nov 9, 2024
@akstron akstron changed the title Create Cassandra db schema on session initialization [WIP] Create Cassandra db schema on session initialization Nov 10, 2024
@akstron akstron marked this pull request as draft November 10, 2024 05:21
…ution for initialize database

Signed-off-by: Alok Kumar Singh <[email protected]>
akstron and others added 5 commits November 21, 2024 23:47
Signed-off-by: Alok Kumar Singh <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
if !c.Schema.CreateSchema {
	return nil
}
cluster, err := c.NewCluster()
Member

check error

Member

actually, all this code effectively repeats NewSession, the only difference is the keyspace. Can you move it into a helper createSession(keyspace string)?

Contributor Author

The current implementation of newSessionPrerequisites uses a hack: it overrides cluster.Keyspace, which was set in c.NewCluster(), so that the connection can be established without a keyspace.
So, are you suggesting enclosing this override logic in createSession(keyspace string)?

Or should I create a copy of the Configuration with a different keyspace to do the work, so that we don't have to set cluster.Keyspace = "" after creating the cluster?

Member

You can make the new function depend on the config argument and pass whichever config is needed in each instance, including a clone with the keyspace overridden.
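
A rough sketch of that suggestion, using pared-down stand-in types rather than the real `Configuration` and driver session. `withoutKeyspace` and the `session` type are hypothetical names introduced only for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// Configuration is a hypothetical, pared-down stand-in for the real config type.
type Configuration struct {
	Servers  []string
	Keyspace string
}

// session is a placeholder for a driver session; it only records the
// keyspace it was opened with.
type session struct {
	keyspace string
}

// createSession depends only on its argument, so each caller decides which
// configuration (the original or a modified clone) is used to connect.
func createSession(cfg *Configuration) (*session, error) {
	if len(cfg.Servers) == 0 {
		return nil, errors.New("no servers configured")
	}
	return &session{keyspace: cfg.Keyspace}, nil
}

// withoutKeyspace returns a shallow copy with the keyspace cleared, leaving
// the caller's configuration untouched.
func withoutKeyspace(c *Configuration) *Configuration {
	clone := *c
	clone.Keyspace = ""
	return &clone
}

func main() {
	cfg := &Configuration{Servers: []string{"127.0.0.1"}, Keyspace: "jaeger_v1_test"}

	// Session used for schema creation: connects without a keyspace.
	bootstrap, err := createSession(withoutKeyspace(cfg))
	if err != nil {
		panic(err)
	}
	// Regular session: connects with the configured keyspace.
	regular, err := createSession(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("bootstrap=%q regular=%q original=%q\n", bootstrap.keyspace, regular.keyspace, cfg.Keyspace)
}
```

Because the clone is a shallow copy, slices such as `Servers` are still shared with the original; that is harmless as long as the helper only reads them.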

@@ -211,6 +273,23 @@ func (c *Configuration) String() string {
}

func (c *Configuration) Validate() error {
	govalidator.CustomTypeTagMap.Set("cassandraTTLValidation", func(i interface{}, _ interface{}) bool {
Member

why do we need custom functions, can the conditions not be expressed directly via tags?

Member

if we must have custom logic then just write it directly, with proper error messages - right now you are just returning a Boolean, so at best validator can say "property x did not validate".

Contributor Author

> why do we need custom functions, can the conditions not be expressed directly via tags?

Couldn't find such validations here: https://github.com/asaskevich/govalidator

> if we must have custom logic then just write it directly, with proper error messages - right now you are just returning a Boolean, so at best validator can say "property x did not validate".

Fixed it.
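
For illustration, validation written directly with descriptive errors might look like the sketch below. The field names and the rules (replication factor of at least 1, TTLs of at least one second) are assumptions made for the example, not the rules this PR enforces:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Schema is a hypothetical subset of the schema settings.
type Schema struct {
	ReplicationFactor int
	TraceTTL          time.Duration
	DependenciesTTL   time.Duration
}

// Validate checks each field explicitly and reports every violation with a
// message that names the offending property and its value.
func (s *Schema) Validate() error {
	var errs []error
	if s.ReplicationFactor < 1 {
		errs = append(errs, fmt.Errorf("replication_factor must be at least 1, got %d", s.ReplicationFactor))
	}
	ttls := []struct {
		name  string
		value time.Duration
	}{
		{"trace_ttl", s.TraceTTL},
		{"dependencies_ttl", s.DependenciesTTL},
	}
	for _, t := range ttls {
		// Assumed rule: a TTL shorter than one second is rejected.
		if t.value < time.Second {
			errs = append(errs, fmt.Errorf("%s must be at least %s, got %s", t.name, time.Second, t.value))
		}
	}
	// errors.Join (Go 1.20+) returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	bad := Schema{ReplicationFactor: 0, TraceTTL: 500 * time.Millisecond, DependenciesTTL: time.Hour}
	if err := bad.Validate(); err != nil {
		fmt.Println("invalid schema config:")
		fmt.Println(err)
	}
}
```

Compared with a Boolean validator callback, each failure here carries its own message, and several problems can be reported in one pass.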

if [ "${SKIP_APPLY_SCHEMA}" = "false" ]; then
  apply_schema "$schema_version" "$primaryKeyspace"
  apply_schema "$schema_version" "$archiveKeyspace"
fi

if [ "${jaegerVersion}" = "v1" ]; then
  STORAGE=cassandra make storage-integration-test
Member

how is this going to distinguish the two runs with SKIP_APPLY_SCHEMA=true/false?

Contributor Author

I didn't quite get the question.

Member

@yurishkuro yurishkuro Nov 22, 2024

  1. the CI workflow is not providing / varying SKIP_APPLY_SCHEMA, so by default it's false and the schema is being pre-created as before, i.e. none of your new code is being CI-tested
  2. even if the CI workflow was running a matrix with SKIP_APPLY_SCHEMA varied true/false, the actual test is run with the above command that doesn't tell the test to pass any additional arguments to the code, meaning that the CreateSchema field in the config will still be false and won't execute your schema creation logic

Contributor Author

I suggest we add create: true to config-cassandra.yaml. Or do we still want to run the integration tests with create: false?

For the workflow, we can add the -s flag to the command based on skip-apply-schema: [true, false] in the matrix strategy.

Does this sound good?

Member

we want to have integration tests that do both: (1) script-created schema (as we do today) and (2) in-code created schema (your fix)
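
One possible way for a test to pick between the two modes is to derive the in-code creation flag from the same `SKIP_APPLY_SCHEMA` variable the script reads. This is only a sketch: `createSchemaFromEnv` is a hypothetical helper, and the mapping (script skipped means the code creates the schema) is an assumption, not something the PR defines:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// createSchemaFromEnv decides whether the code under test should create the
// schema itself. Assumption: when the script skips applying the schema
// (SKIP_APPLY_SCHEMA=true), in-code creation must be switched on.
// lookup has the same signature as os.LookupEnv so tests can inject values.
func createSchemaFromEnv(lookup func(string) (string, bool)) (bool, error) {
	raw, ok := lookup("SKIP_APPLY_SCHEMA")
	if !ok || raw == "" {
		return false, nil // default: the schema is pre-created by the script
	}
	skip, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid SKIP_APPLY_SCHEMA value %q: %w", raw, err)
	}
	return skip, nil
}

func main() {
	create, err := createSchemaFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("create schema in code: %t\n", create)
}
```

With a CI matrix that varies the variable, the same test binary would then exercise both the script-created and the in-code schema paths.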

akstron and others added 4 commits November 26, 2024 22:40
Signed-off-by: Alok Kumar Singh <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
// newSessionPrerequisites creates tables and types before creating a session
func (c *Configuration) newSessionPrerequisites() error {
	cfg := *c // clone because we need to connect without specifying a keyspace
	cfg.Schema.Keyspace = ""
Member

make fmt

Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
line = line[0:commentIndex]
}

if len(line) == 0 {
Member

it would make sense to trim spaces before checking for len=0.
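
A sketch of what trimming before the emptiness check could look like in a line-based splitter. `splitQueries` is a hypothetical name, and the `--` comment marker and the rule that a statement ends on a line whose last character is `;` are assumptions about the template format:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// splitQueries turns rendered schema text into individual statements.
// Comments are assumed to start with "--" and run to the end of the line.
func splitQueries(rendered string) ([]string, error) {
	var queries []string
	var current strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(rendered))
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.Index(line, "--"); i >= 0 {
			line = line[:i]
		}
		// Trim first, so whitespace-only lines are treated as empty.
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		current.WriteString(line)
		if strings.HasSuffix(line, ";") {
			queries = append(queries, current.String())
			current.Reset()
		} else {
			current.WriteString(" ")
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	if current.Len() > 0 {
		return nil, errors.New("invalid template: unterminated statement")
	}
	return queries, nil
}

func main() {
	input := "-- types\nCREATE TYPE t (\n    a text  \n);\n   \nCREATE TABLE x (id int PRIMARY KEY); -- trailing\n"
	queries, err := splitQueries(input)
	if err != nil {
		panic(err)
	}
	for _, q := range queries {
		fmt.Println(q)
	}
}
```

Without the trim, the whitespace-only line and the line left behind after stripping a trailing comment would both pass a bare `len(line) == 0` check and end up inside a statement.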

Comment on lines +90 to +92
if len(queryString) > 0 {
	return nil, errors.New(`invalid template`)
}
Member

I would do it in generateSchemaIfNotPresent against casQueries, not here.

return casQueries, nil
}

func generateSchemaIfNotPresent(session cassandra.Session, cfg *Schema) error {
Member

There are too many functions in this file that are polluting the overall package namespace. I would prefer to introduce a helper struct

type schemaCreator struct {
  session cassandra.Session
  cfg *Schema
}

and define those functions on that struct (and minimize parameter passing)
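
As a sketch of how that struct could be used, the example below hangs the helpers off `schemaCreator` so they share the session and config instead of receiving them as parameters. The `Session` interface here is a hypothetical one-method stand-in, not the real `cassandra.Session`, and the statements and method names are illustrative:

```go
package main

import "fmt"

// Session is a hypothetical minimal interface used only for this sketch.
type Session interface {
	Exec(query string) error
}

// Schema is a pared-down stand-in for the schema settings.
type Schema struct {
	Keyspace string
}

// schemaCreator groups the schema helpers so they stay out of the package
// namespace and share state through the receiver.
type schemaCreator struct {
	session Session
	cfg     *Schema
}

// queries builds the statements from the receiver's config.
func (sc *schemaCreator) queries() []string {
	return []string{
		fmt.Sprintf("CREATE TYPE IF NOT EXISTS %s.keyvalue (key text, value_string text);", sc.cfg.Keyspace),
		fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s.service_names (service_name text PRIMARY KEY);", sc.cfg.Keyspace),
	}
}

// createIfNotPresent executes every statement and stops at the first failure.
func (sc *schemaCreator) createIfNotPresent() error {
	for _, q := range sc.queries() {
		if err := sc.session.Exec(q); err != nil {
			return fmt.Errorf("failed to execute %q: %w", q, err)
		}
	}
	return nil
}

// recordingSession is a test double that remembers what was executed.
type recordingSession struct {
	executed []string
}

func (r *recordingSession) Exec(q string) error {
	r.executed = append(r.executed, q)
	return nil
}

func main() {
	rec := &recordingSession{}
	sc := &schemaCreator{session: rec, cfg: &Schema{Keyspace: "jaeger_v1_test"}}
	if err := sc.createIfNotPresent(); err != nil {
		panic(err)
	}
	fmt.Printf("executed %d statements\n", len(rec.executed))
}
```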

Signed-off-by: Alok Kumar Singh <[email protected]>

c.Schema.Keyspace = ""

session, err := createSession(c)
Member

Suggested change
- session, err := createSession(c)
+ session, err := createSession(cfg)

return err
}

return generateSchemaIfNotPresent(session, &c.Schema)
Member

Suggested change
- return generateSchemaIfNotPresent(session, &c.Schema)
+ return generateSchemaIfNotPresent(session, &cfg.Schema)

Member

not sure about this one

Development

Successfully merging this pull request may close these issues.

Create database schema in Cassandra automatically
3 participants